Limits for compression of quantum information carried by ensembles of mixed states 

Michal Horodecki

Institute of Theoretical Physics and Astrophysics 

University of Gdansk, 80-952 Gdansk, Poland 

We consider the problem of compression of the quantum information carried by an ensemble of
mixed states. We prove that for arbitrary coding schemes the least number of qubits needed to
convey the signal states asymptotically faithfully is bounded from below by the Holevo function
$S(\varrho) - \sum_i p_i S(\varrho_i)$. We also show that a compression protocol can be composed with another one,
provided that the latter offers perfect transmission. Such a compound protocol is applied to the case
of a binary source. It is conjectured to reach the obtained bound. Finally, we point out that in the
case of mixed signal states there could be a difference between the maximal compression rates of
the coding schemes which are "blind" to the signal and of the ones which assume knowledge of
the identities of the signal states.

I. COMPRESSION OF QUANTUM INFORMATION

One of the important problems of information theory is the compression of information. A general limit on the
compression rate of classical information is set by the so-called noiseless coding theorem [1]. Suppose that a source
generates a message $i$ with probability $p_i$, and that we are allowed to accumulate the subsequent messages into long
sequences and then represent (encode) them as sequences of bits as economically as possible (economically means here
that we want to use the least possible average number of bits per message). The task of the receiver (Bob) is to
convert (decode) the binary sequences into the original sequences of messages. Here we do not require perfect
transmission but only asymptotically faithful transmission. This means that Bob may be unable to recover each sequence
correctly, but the probability of error tends to zero as the length of the input blocks tends to infinity.

Now the noiseless coding theorem [1] says that the necessary and sufficient number of bits per message needed
for asymptotically faithful transmission is equal to the Shannon entropy $S = -\sum_i p_i \log p_i$ (in this paper we use
base-2 logarithms) of the probability distribution characterizing the source. This quantity thus tells us how much
information per message is actually produced by the source. Indeed, one can imagine that after the most economical
compression procedure each piece of the compressed signal is equally essential, as all redundancy has been removed.
The size of the maximally compressed signal can then be interpreted as the quantity of information contained in the
input (uncompressed) one.
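
As a small numerical illustration of the Shannon entropy (a sketch only; the function name and the two example distributions are our own choices, not taken from the text), one can compare a uniform four-message source with a biased one:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy S = -sum_i p_i log2 p_i, in bits per message."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i = 0 contribute nothing (0 log 0 is taken as 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative sources (assumed for this example only).
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
biased = shannon_entropy([0.5, 0.25, 0.125, 0.125])
print(uniform, biased)
```

The biased source needs fewer bits per message on average than the uniform one, which is the redundancy that the compression removes.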

Let us now turn to the problem of compression of quantum information, which was first considered by Schumacher
[?]. The messages are here replaced by quantum states $\varrho_i$ and the bits by qubits, i.e. two-level systems. The
probability of error is generalized to the quantum case by means of a chosen measure of fidelity or distortion [?]
between two quantum states. Thus we ask about the least number of two-level systems needed to carry the information
asymptotically faithfully to Bob, i.e. so that the average distortion between the input and output states tends to
zero (or the average fidelity tends to one) in the limit of input signal blocks of infinite length.

Before we review the results obtained so far, let us mention the fundamental difference between the quantum and
classical cases due to the no-cloning theorem for quantum states [?]. It was shown that the theorem is equivalent to the
impossibility of measuring the state parameters of a single quantum system [?]. We can then imagine two scenarios
which, given the above restriction on quantum information processing, could in principle produce different
results [?]. Within the first scenario, we assume that Alice does not know the identities of the particular states
produced by the source. Then, in accordance with the no-cloning theorem, Alice has no means of obtaining this knowledge.
Thus the most general coding protocol available to Alice amounts to performing a quantum operation (a trace-preserving
completely positive map, see Appendix) [?] which depends only on the known characteristics of the source, i.e. the form
of the generated ensemble $\{p_i, \varrho_i\}$. We will call it blind coding. However, if we allow Alice to know each of the
produced states, we deal with the second scenario (arbitrary or non-blind coding), where Alice's coding amounts to
replacing the sequences of signal states by completely arbitrary new states. It seems that in some cases this will
produce more efficient compression than is possible within the previous scenario.

Let us now review the results obtained so far in the domain of compression of quantum information. For an
ensemble of pure states, Schumacher showed [?] that, by means of blind coding, it is possible to reduce the needed
number of qubits to the value of the von Neumann entropy of the total density matrix of the ensemble
$\varrho = \sum_i p_i \varrho_i$ (in short, the von Neumann entropy of the ensemble). The proposed coding-decoding protocol was then
simplified by Jozsa and Schumacher (we will refer to it as the SJ protocol). To obtain the converse statement, saying
that this quantity is also necessary for faithful recovery of the signal, Barnum et al. [?] considered arbitrary coding
schemes. It turned out that even in this case it is impossible to compress the data any better, so that the obtained
lower bound applies also to the first scenario. Thus for ensembles of pure states the problem of compression has been
completely solved: the two scenarios give the same degree of compression, and the information per message contained in
such an ensemble is equal to the von Neumann entropy $S(\varrho)$ of the ensemble. Note that this establishes a precise
sense of the von Neumann entropy within quantum information theory [10].

Now, the problem for ensembles of mixed states is still open. The SJ coding protocol allows one to compress such an
ensemble down to the value of its von Neumann entropy [?] (see also [?] in this context), but it is known that in
some cases more efficient protocols are possible [11]. To illustrate this, let us consider a source producing with
certainty some fixed mixed state. Then the ensemble has entropy greater than zero, but of course it does not
carry any information. This implies that the "information content" of the ensemble (for either of the two scenarios)
cannot be, in general, merely a function of the density matrix of the ensemble. Instead, it must depend on the
particular form of the ensemble. Moreover, it seems that for ensembles of mixed states arbitrary coding could
produce more efficient compression than blind coding. In view of these considerations, it is desirable to investigate
the problem of compression of information carried by ensembles of mixed states. In particular, an important task is to
provide some limits for the compression rates with the two types of coding.

In this paper we provide a lower bound on the number of qubits per message needed for faithful
transmission of the quantum information carried by an ensemble of mixed states, for arbitrary coding. The bound is
equal to the function $S(\varrho) - \sum_i p_i S(\varrho_i)$ (we will call it the Holevo information), which was shown by Holevo
to be an upper bound on the accessible information [?]. In particular, it implies that for ensembles of states with
disjoint supports the two considered types of coding produce the same result. Further, we investigate the problem of
composing compression protocols. We consider a class of non-blind coding protocols which involve the composition of two
protocols: an ideal one, which amounts to replacing the input states by new states which, partially traced, reproduce
the former ones, and the SJ protocol (applied to mixed states). Finally, we conjecture that if arbitrary coding
schemes are allowed then the Holevo information is in fact equal to the minimal number of qubits needed for faithful
transmission, and that the bound can be reached by means of the proposed class of protocols.
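
To make the Holevo information concrete, the following minimal sketch evaluates $S(\varrho) - \sum_i p_i S(\varrho_i)$ for a few qubit ensembles. It assumes real symmetric 2x2 density matrices so that the eigenvalues can be written in closed form; the helper names and the example ensembles are ours and serve only as an illustration.

```python
import math

def eigvals_2x2(m):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean - radius, mean + radius

def von_neumann_entropy(rho):
    """S(rho) = -Tr rho log2 rho, computed from the eigenvalues."""
    return -sum(x * math.log2(x) for x in eigvals_2x2(rho) if x > 1e-15)

def holevo_information(probs, states):
    """S(sum_i p_i rho_i) - sum_i p_i S(rho_i) for 2x2 real symmetric states."""
    avg = [[sum(p * s[r][c] for p, s in zip(probs, states)) for c in range(2)]
           for r in range(2)]
    mean_entropy = sum(p * von_neumann_entropy(s) for p, s in zip(probs, states))
    return von_neumann_entropy(avg) - mean_entropy

maximally_mixed = [[0.5, 0.0], [0.0, 0.5]]
zero = [[1.0, 0.0], [0.0, 0.0]]      # |0><0|
one = [[0.0, 0.0], [0.0, 1.0]]       # |1><1|
plus = [[0.5, 0.5], [0.5, 0.5]]      # |+><+|

# A source emitting one fixed mixed state: entropy 1 bit, no information.
chi_fixed = holevo_information([1.0], [maximally_mixed])
# Two orthogonal (disjoint-support) pure states: one full bit.
chi_orthogonal = holevo_information([0.5, 0.5], [zero, one])
# Two nonorthogonal pure states: strictly less than one bit.
chi_overlapping = holevo_information([0.5, 0.5], [zero, plus])
print(chi_fixed, chi_orthogonal, chi_overlapping)
```

The first case reproduces the remark above: the ensemble density matrix has nonzero entropy, yet the Holevo information vanishes.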

II. COMPRESSION PROTOCOLS 

Suppose that Alice generates a signal state $\varrho_i^0$ acting on a Hilbert space $\mathcal{H}_Q$ with probability $p_i^0$. The produced
ensemble $\mathcal{E}_0 = \{p_i^0, \varrho_i^0\}$ has the density matrix $\varrho^0 = \sum_i p_i^0 \varrho_i^0$. Denote now the product
$\varrho^0_{i_1} \otimes \ldots \otimes \varrho^0_{i_N}$ by $\varrho_i$, where $i$ now stands for a multiindex (to avoid complicated notation we do
not write the index $N$ explicitly unless necessary). The corresponding ensemble and state are denoted by $\mathcal{E}$ and
$\varrho$ respectively. Now Alice performs a coding operation over the initial ensemble $\mathcal{E}_0$, ascribing to any input state
$\varrho_i$ a new state $\tilde\varrho_i$. The map $\varrho_i \to \tilde\varrho_i = \Lambda_A(\varrho_i)$ is supposed to be a quantum operation for blind
coding, or an arbitrary map for non-blind coding. In the latter case we even allow Alice to know which states are
generated by the source, so that she can prepare each of the states $\tilde\varrho_i$ separately for each $i$.

The new states $\tilde\varrho_i$ represent the compressed signal, which is then transferred onto a suitable number of qubits,
determined by the dimension of the subspace occupied by the state $\tilde\varrho$ of the ensemble, and sent through a noiseless
channel to Bob. Now the states $\tilde\varrho_i$ are to be decoded so as to become close to the initial states $\varrho_i$. For this
purpose Bob performs some fixed quantum operation $\Lambda_B$, which of course does not depend on $i$. The resulting states
are then $\varrho'_i = \Lambda_B(\tilde\varrho_i)$ and the whole scheme is the following

$\varrho_i \xrightarrow{\ \Lambda_A\ } \tilde\varrho_i \xrightarrow{\ \Lambda_B\ } \varrho'_i$ (1)

where $\varrho_i$ and $\varrho'_i$ act on the Hilbert space $\mathcal{H}_Q^{\otimes N}$ while $\tilde\varrho_i$ acts on the channel Hilbert space $\mathcal{H}_C$.
Without loss of generality we can assume (as in Ref. [?]) that $\mathcal{H}_C = \mathcal{H}_Q^{\otimes N}$. As a measure of distortion
characterizing the quality of the transmission $\varrho_i \to \varrho'_i$ we choose the metric induced by the trace norm. The latter
is defined as

$\|A\| = \mathrm{Tr}|A|$ (2)

with $|A| = \sqrt{A^\dagger A}$. Thus the trace norm of a Hermitian operator is simply the sum of the absolute values of its
eigenvalues. Consequently, the distortion is defined as

$D(\varrho, \sigma) = \|\varrho - \sigma\|$ (3)

An important property of the proposed measure of distortion is that it does not increase under quantum
operations (see Appendix). The average distortion $\bar{D} = \sum_i p_i D(\varrho_i, \varrho'_i)$ then indicates the quality of the
recovery of quantum information by Bob after compression by Alice. Now, for a fixed source determined by the
ensemble $\mathcal{E}_0$, one considers sequences of coding-decoding pairs $(\Lambda_A, \Lambda_B)$ with the property that
$\lim_{N\to\infty} \bar{D} = 0$ (recall that the pair is implicitly indexed by $N$). Such sequences will be called protocols.
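
The trace-norm distortion and its behaviour under a quantum operation can be checked numerically in a simple case. The sketch below assumes real symmetric 2x2 matrices and uses a depolarizing map $\varrho \to (1-q)\varrho + qI/2$ as an example operation; that channel and all names are our own choices for illustration.

```python
import math

def trace_norm_2x2(a):
    """Trace norm of a real symmetric 2x2 matrix: sum of |eigenvalues|."""
    mean = (a[0][0] + a[1][1]) / 2.0
    radius = math.sqrt(((a[0][0] - a[1][1]) / 2.0) ** 2 + a[0][1] ** 2)
    return abs(mean - radius) + abs(mean + radius)

def distortion(rho, sigma):
    """D(rho, sigma) = || rho - sigma || in the trace norm."""
    diff = [[rho[r][c] - sigma[r][c] for c in range(2)] for r in range(2)]
    return trace_norm_2x2(diff)

def depolarize(rho, q):
    """Example quantum operation (assumed here): rho -> (1 - q) rho + q I/2."""
    return [[(1.0 - q) * rho[r][c] + (q / 2.0 if r == c else 0.0)
             for c in range(2)] for r in range(2)]

rho = [[1.0, 0.0], [0.0, 0.0]]      # |0><0|
sigma = [[0.5, 0.5], [0.5, 0.5]]    # |+><+|

d_before = distortion(rho, sigma)
d_after = distortion(depolarize(rho, 0.5), depolarize(sigma, 0.5))
print(d_before, d_after)
```

For this particular channel the distortion shrinks by exactly the factor $1 - q$, consistent with the non-increase property proved in the Appendix.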

Define now the quantity $R_P$ characterizing the asymptotic degree of compression of the initial quantum data for a
given protocol $P$ by

$R_P = \lim_{N\to\infty} \frac{1}{N} \log \dim \tilde\varrho$ (4)

Here $\dim \tilde\varrho$ denotes the dimension of the support of the state $\tilde\varrho$, given by the number of its nonzero eigenvalues.
The quantity $\log \dim \tilde\varrho$ has the interpretation of the number of qubits needed to carry the state $\tilde\varrho$ undisturbed
($\tilde\varrho$ is to be transferred through a noiseless channel).
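
The meaning of the rate $\frac{1}{N}\log\dim\tilde\varrho$ can be illustrated in the commuting (classical) special case, which is an assumption of this sketch and not the general quantum setting. For a source state $\mathrm{diag}(p, 1-p)$, projecting the $N$-fold product onto its most likely eigenvectors until they carry probability at least $1 - \epsilon$ gives a support whose dimension is counted below; the function names and parameter values are illustrative.

```python
import math

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def support_rate(p, n, eps):
    """(1/n) log2 of the number of most likely length-n bit strings needed
    to capture probability >= 1 - eps, for a bit source with P(0) = p >= 1/2.
    Strings with k ones all have weight p**(n-k) * (1-p)**k, so whole
    classes are added in order of increasing k."""
    if not 0.5 <= p < 1.0:
        raise ValueError("this sketch assumes 1/2 <= p < 1")
    q = 1.0 - p
    total_prob, dimension = 0.0, 0
    for k in range(n + 1):
        # Binomial weight of class k, via logarithms to avoid overflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(q) + (n - k) * math.log(p))
        total_prob += math.exp(log_pmf)
        dimension += math.comb(n, k)   # exact integer count of strings
        if total_prob >= 1.0 - eps:
            break
    return math.log2(dimension) / n

rate_100 = support_rate(0.9, 100, 0.01)
rate_1000 = support_rate(0.9, 1000, 0.01)
print(rate_100, rate_1000, binary_entropy(0.9))
```

As the block length grows the rate decreases towards the entropy of the source, while staying well below the one qubit per message of the uncompressed signal.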

Now, given a class $\mathcal{P}$ of protocols, we define the quantity

$I_{\mathcal{P}} = \inf_{P \in \mathcal{P}} R_P$ (5)

which is equal to the least number of qubits per message needed for asymptotically faithful transmission of the initial
signal states from Alice to Bob within the considered class of protocols (to be strict, one needs $I_{\mathcal{P}} + \delta$ qubits per
message, where $\delta$ can be chosen arbitrarily small). As discussed in Sec. I, we are interested in two classes of
protocols: those with blind coding schemes and those with arbitrary ones. Accordingly we will consider two kinds of
information: the passive information $I_p = I_{\mathcal{P}}$, where $\mathcal{P}$ is the class of protocols with blind coding, and the
effective information $I_e$, with the infimum taken over protocols with arbitrary coding. The effective information
represents the amount of information which seems to be actually carried by the ensemble, while the passive information
$I_p$ represents the information which is "seen" by a quantum apparatus which is "blind" to the signal. Although the
actual information content of the ensemble could in fact be lower, the apparatus cannot benefit from it, as it cannot
in general read the identities of the signal states without disturbance. As a result the compression rate is
restricted by the value of the passive information. Finally, it is convenient to introduce the information defect
$I_d = I_p - I_e$. This quantity tells us how "unkind" the ensemble is to us: while carrying little information, the
ensemble requires to be processed as if it contained a large amount of information. Let us recall here that for
ensembles of pure states the impossibility of reading the input states does not decrease the compression efficiency,
and the defect is equal to zero in spite of the nonorthogonality of the signal states [?].

III. THE BOUND FOR EFFECTIVE INFORMATION 

In this section we will prove the main result of this paper. 

Theorem.- The Holevo information $I(\mathcal{E}_0) = S(\varrho) - \sum_i p_i S(\varrho_i)$ of the ensemble is a lower bound for its effective
information:

$I_e(\mathcal{E}_0) \ge I(\mathcal{E}_0)$ (6)

Note that, since by definition we have $I_e \le I_p$, the theorem automatically provides a lower bound for the passive
information $I_p$. Note also that for ensembles of pure states the Holevo information is simply equal to the entropy
of the ensemble, so that the theorem is compatible with the result of Ref. [?] (up to the measure of the quality of
transmission).
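
Collecting the statements made so far, the quantities introduced above can be arranged in a single chain. The display below is only a summary under our reading of the notation (in particular the symbols $I_e$, $I_p$ and $I_d$ for the effective information, passive information and information defect); the last inequality uses the remark from Sec. I that the blind SJ protocol compresses a mixed-state ensemble down to its von Neumann entropy.

```latex
I(\mathcal{E}_0) \;\le\; I_e(\mathcal{E}_0) \;\le\; I_p(\mathcal{E}_0) \;\le\; S(\varrho),
\qquad
I_d = I_p - I_e \;\ge\; 0 .
```

For an ensemble of pure states $I(\mathcal{E}_0) = S(\varrho)$, so all inequalities become equalities and the defect vanishes.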

To prove the theorem we need the lemma saying that if the average distortion between the two ensembles is small, 
then the difference between their Holevo informations per message is also small. 

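Lemma.- Let $\sum_i p_i \|\varrho_i - \varrho'_i\| = \epsilon \le \frac{1}{3}$. Then the following inequality is valid

$|I(\mathcal{E}) - I(\mathcal{E}')| \le 2\left[\epsilon N \log d + \eta(\epsilon)\right]$, (7)

where $\eta(x) = -x \ln x$ with $\eta(0) = 0$, and $d = \dim \mathcal{H}_Q$.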
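Proof.- We will use the following estimate [?]

$|S(\varrho) - S(\sigma)| \le \|\varrho - \sigma\| \log \dim \mathcal{H} + \eta(\|\varrho - \sigma\|)$ (8)

which is valid for states $\varrho$, $\sigma$ acting on the Hilbert space $\mathcal{H}$, with $\|\varrho - \sigma\| \le \frac{1}{3}$. Based on the above
inequality, we obtain

$|S(\varrho) - S(\varrho')| \le N \log d \, \|\varrho - \varrho'\| + \eta(\|\varrho - \varrho'\|)$
$\le N \log d \sum_i p_i \|\varrho_i - \varrho'_i\| + \eta\left(\sum_i p_i \|\varrho_i - \varrho'_i\|\right) = \epsilon N \log d + \eta(\epsilon)$ (9)

where we used the fact that the trace norm is convex and that the function $\eta$ is increasing on the interval
$(0, \frac{1}{3})$. We also have

$\sum_i p_i |S(\varrho_i) - S(\varrho'_i)| \le \sum_i p_i \left[N \log d \, \|\varrho_i - \varrho'_i\| + \eta(\|\varrho_i - \varrho'_i\|)\right]$
$\le \epsilon N \log d + \eta(\epsilon)$ (10)

where the concavity of the function $\eta$ was used. Now adding the two above inequalities and using the triangle
inequality $|I(\mathcal{E}) - I(\mathcal{E}')| \le |S(\varrho) - S(\varrho')| + \sum_i p_i |S(\varrho_i) - S(\varrho'_i)|$ we obtain the desired result.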
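
A continuity estimate of the type (8) can be spot-checked numerically for a single pair of states. The sketch below is only a sanity check, not a proof: it assumes real symmetric 2x2 density matrices, and it takes all logarithms natural (entropy in nats) so that both terms of the bound are expressed in the same units, which is our own simplification. The example states are arbitrary.

```python
import math

def eigvals_2x2(m):
    """Eigenvalues of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2.0
    radius = math.sqrt(((m[0][0] - m[1][1]) / 2.0) ** 2 + m[0][1] ** 2)
    return mean - radius, mean + radius

def entropy_nats(rho):
    """von Neumann entropy with natural logarithm."""
    return -sum(x * math.log(x) for x in eigvals_2x2(rho) if x > 1e-15)

def trace_distance(rho, sigma):
    """|| rho - sigma || = sum of absolute eigenvalues of the difference."""
    diff = [[rho[r][c] - sigma[r][c] for c in range(2)] for r in range(2)]
    return sum(abs(x) for x in eigvals_2x2(diff))

def continuity_gap(rho, sigma, dim=2):
    """Bound minus actual entropy difference; nonnegative if (8) holds."""
    t = trace_distance(rho, sigma)
    if t > 1.0 / 3.0:
        raise ValueError("the estimate is only claimed for distance <= 1/3")
    eta = -t * math.log(t) if t > 0 else 0.0
    bound = t * math.log(dim) + eta
    return bound - abs(entropy_nats(rho) - entropy_nats(sigma))

rho = [[0.9, 0.0], [0.0, 0.1]]
sigma = [[0.8, 0.1], [0.1, 0.2]]
gap = continuity_gap(rho, sigma)
print(trace_distance(rho, sigma), gap)
```

A positive gap for this pair is consistent with the estimate; the lemma then applies the same bound to the ensemble density matrices and to each pair of signal states.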

Now we can start to prove the theorem. For this purpose let us estimate the quantity $\log \dim \tilde\varrho$. First, it is bounded
from below by $I(\tilde{\mathcal{E}})$. This follows from the obvious fact that the von Neumann entropy of a state is less than or equal
to the logarithm of the dimension of the Hilbert space the state acts on, together with $I(\tilde{\mathcal{E}}) \le S(\tilde\varrho)$. Now let us
note [14,15] that the function $I(\mathcal{E})$ can be written as the mean relative entropy between the components $\varrho_i$ of the
ensemble and the density matrix $\varrho$ of the latter

$I(\mathcal{E}) = \sum_i p_i S(\varrho_i|\varrho)$ (11)

where the relative entropy [17] is given by

$S(\varrho|\sigma) = \mathrm{Tr}(\varrho \log \varrho - \varrho \log \sigma)$ (12)

Then we can make use of the Uhlmann monotonicity theorem [18], which states that the relative entropy does not increase
under the action of a completely positive trace-preserving map (quantum operation). Thus we obtain the inequality

$I(\tilde{\mathcal{E}}) \ge I(\mathcal{E}')$ (13)

as the ensemble $\mathcal{E}'$ is produced by Bob's quantum operation from the ensemble $\tilde{\mathcal{E}}$. Using the inequality (13) and
applying the lemma we get

$\log \dim \tilde\varrho \ge I(\mathcal{E}) - 2\left[\bar{D} N \log \dim \mathcal{H}_Q + \eta(\bar{D})\right]$. (14)

Noting that $I(\mathcal{E}) = N I(\mathcal{E}_0)$, dividing both sides of the obtained inequality by $N$ and taking the limit $N \to \infty$ we
obtain the desired result.
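
The two ingredients of the proof, the representation (11) of the Holevo information as a mean relative entropy and its monotonicity (13), can be checked numerically in the commuting case. In this sketch, which is an illustration under that assumption only, states are probability vectors, the relative entropy is the classical one, and the "quantum operation" is a stochastic matrix; the example ensemble and channel are arbitrary.

```python
import math

def entropy(dist):
    return -sum(x * math.log2(x) for x in dist if x > 0)

def relative_entropy(p, q):
    """Classical S(p|q) = sum_x p_x (log2 p_x - log2 q_x).
    Assumes the support of p lies inside the support of q."""
    return sum(a * (math.log2(a) - math.log2(b)) for a, b in zip(p, q) if a > 0)

def average(probs, dists):
    return [sum(w * d[x] for w, d in zip(probs, dists)) for x in range(len(dists[0]))]

def holevo(probs, dists):
    """Entropy of the average minus the average entropy."""
    return entropy(average(probs, dists)) - sum(w * entropy(d) for w, d in zip(probs, dists))

def mean_relative_entropy(probs, dists):
    """Right-hand side of Eq. (11) for commuting states."""
    avg = average(probs, dists)
    return sum(w * relative_entropy(d, avg) for w, d in zip(probs, dists))

def apply_channel(channel, dist):
    """channel[y][x] = probability of output y given input x."""
    return [sum(channel[y][x] * dist[x] for x in range(len(dist)))
            for y in range(len(channel))]

probs = [0.5, 0.5]
dists = [[0.9, 0.1], [0.2, 0.8]]
flip = [[0.9, 0.1], [0.1, 0.9]]   # binary symmetric channel, flip prob. 0.1

chi_in = holevo(probs, dists)
chi_rel = mean_relative_entropy(probs, dists)
chi_out = holevo(probs, [apply_channel(flip, d) for d in dists])
print(chi_in, chi_rel, chi_out)
```

The first two numbers agree, as in Eq. (11), and the third is smaller, as required by Eq. (13).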

Let us now summarize the idea of the proof. First, the number of needed qubits per message is bounded from
below by $I(\tilde{\mathcal{E}})/N$. Now Bob obtains the final ensemble $\mathcal{E}'$ from the ensemble $\tilde{\mathcal{E}}$ by means of a quantum operation, which
by the Uhlmann theorem can only decrease the Holevo information per message. Hence we have $I(\tilde{\mathcal{E}})/N \ge I(\mathcal{E}')/N$.
But from the lemma it follows that the initial and final ensembles have asymptotically equal Holevo information per
message, $I(\mathcal{E})/N \approx I(\mathcal{E}')/N$; hence we obtain $I(\tilde{\mathcal{E}})/N \ge I(\mathcal{E})/N$ in the limit of large $N$. Note here that if the
bound is to be reached, then the asymptotic mean entropy $\frac{1}{N}\sum_i p_i S(\tilde\varrho_i)$ of the ensemble $\tilde{\mathcal{E}}$ per message must vanish.
This follows from the fact that the estimate of $\log \dim \tilde\varrho$ by the Holevo information is not too rough only if the
latter amounts to the von Neumann entropy.

Finally, note that for the case of blind coding the Holevo information per message must be equal for all three
ensembles $\mathcal{E}$, $\tilde{\mathcal{E}}$, $\mathcal{E}'$. In other words, we can say that the Holevo information is invariant under asymptotically
reversible operations. The same cannot be stated for the von Neumann entropy. Indeed, otherwise we would not be able
to compress the signal more than indicated by the von Neumann entropy. However, we know that this is possible, e.g.
for a particular ensemble considered in Ref. [?] which consists of states with disjoint supports. Here the signal
states can be measured and replaced by pure ones. Then the entropy of the ensemble decreases to the value of its Holevo
information. The reversal is done by again measuring the pure states and replacing them by the initial, possibly
mixed, ones. Applying the theorem, we find that the passive and effective information are equal and take the value of
the Holevo information of the ensemble. Thus the information defect vanishes not only for ensembles of pure states
but also for ensembles of mixed states with disjoint supports.

IV. COMPOSING PROTOCOLS 

From the discussion of the previous section it follows that the entropy of the density matrix of the "intermediate"
ensemble $\tilde{\mathcal{E}}$ should be as low as possible. In this section we present a particular class of non-blind protocols which
aim at decreasing this entropy. Namely, Alice can replace the input states $\varrho_i$ with new ones $\tilde\varrho_i$ acting on a larger
Hilbert space $\mathcal{H} = (\mathcal{H}_Q^{\otimes N}) \otimes \mathcal{H}'$ such that $\mathrm{Tr}_{\mathcal{H}'} \tilde\varrho_i = \varrho_i$. Then Bob's decoding amounts to performing the
partial trace, i.e. discarding the systems described by the Hilbert space $\mathcal{H}'$. The states $\tilde\varrho_i$ can then produce a
density matrix $\tilde\varrho$ of lower entropy than the initial one. Clearly, the above scheme provides perfect transmission.
However the matrix $\tilde\varrho$, although perhaps of small entropy, will usually occupy a larger Hilbert space than the source
space. To avoid this, one could compose the present (ideal) protocol with the SJ protocol. Then the overall scheme is
the following

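$\varrho_{i_1} \otimes \ldots \otimes \varrho_{i_k} \xrightarrow{\text{Alice's coding}} \tilde\varrho_{i_1} \otimes \ldots \otimes \tilde\varrho_{i_k} \xrightarrow{\text{SJ protocol}} \tilde\varrho'_{i_1 \ldots i_k} \xrightarrow{\text{Bob's partial trace}} \varrho'_{i_1 \ldots i_k}$ (15)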



Here the $i_j$'s are multiindices of length $N$; $\varrho_{i_1} \otimes \ldots \otimes \varrho_{i_k}$ and $\varrho'_{i_1 \ldots i_k}$ act on the Hilbert space
$(\mathcal{H}_Q^{\otimes N})^{\otimes k}$, while $\tilde\varrho_{i_1} \otimes \ldots \otimes \tilde\varrho_{i_k}$ and $\tilde\varrho'_{i_1 \ldots i_k}$ act on $((\mathcal{H}_Q^{\otimes N}) \otimes \mathcal{H}')^{\otimes k}$. The former two states
can be obtained from the latter ones by tracing over the space $\mathcal{H}'^{\otimes k}$. As the distortion measure used does not
increase under the partial trace operation (see Appendix), the average distortion produced by the composed protocol is
less than or equal to that of the "intermediate" SJ protocol. The latter distortion tends to zero if $N$ is kept fixed
and $k$ tends to infinity (of course $N$, although fixed, can be chosen arbitrarily large). Thus by composing the two
protocols we have again obtained a compression protocol. The result can be immediately generalized as follows. Any
protocol providing perfect transmission can be composed with some other protocol so that the full one is again a
protocol, i.e. offers asymptotically faithful transmission.

Turning back to the considered case, we see that since the SJ protocol compresses the signal down to the value of the
entropy of the source ensemble per message [?], the following inequality holds

$I_e \le \lim_{N\to\infty} \frac{1}{N} \inf_{\tilde\varrho} S(\tilde\varrho)$, (16)

where the infimum is taken over the states $\tilde\varrho$ of ensembles which, partially traced, produce the input ensemble $\mathcal{E}$.

Let us illustrate the above result by means of an example of a binary source, i.e. one which generates two kinds
of messages $\varrho_1^0$ and $\varrho_2^0$ with probabilities $p_1$ and $p_2$ respectively (for convenience we will further omit the
indices 0). Suppose that Alice replaces the single signal states by their purifications $P_i = |\psi_i\rangle\langle\psi_i|$ acting on
the Hilbert space $\mathcal{H}_Q \otimes \mathcal{H}'$ [?]. As the source produces only two kinds of states, the entropy of the ensemble of
purifications can be calculated explicitly



$S(\tilde\varrho) = H\left(\frac{1}{2}\left[1 + \sqrt{(p_1 - p_2)^2 + 4 p_1 p_2 |\langle\psi_1|\psi_2\rangle|^2}\right]\right)$ (17)



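where $H(x) = -x \log x - (1 - x) \log(1 - x)$ is the binary entropy function. The minimal entropy is obtained if the
overlap of $\psi_1$ and $\psi_2$ is the largest. The conditional supremum of the overlaps of purifications of the two states
$\varrho_1$ and $\varrho_2$ is given by the fidelity of the states [?]

$\max |\langle\psi_1|\psi_2\rangle|^2 = F(\varrho_1, \varrho_2) = \left(\mathrm{Tr}\sqrt{\sqrt{\varrho_1}\,\varrho_2\sqrt{\varrho_1}}\right)^2$, (18)

so that we obtain

$I_e \le S_{\min}(\tilde\varrho) = H\left(\frac{1}{2}\left[1 + \sqrt{(p_1 - p_2)^2 + 4 p_1 p_2 F(\varrho_1, \varrho_2)}\right]\right)$ (19)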

Now if $\varrho_1$ and $\varrho_2$ have disjoint supports, then $F(\varrho_1, \varrho_2) = 0$ and $S_{\min}$ is equal to the Holevo information of
the ensemble, which is compatible with the discussion in Sec. III (and the discussion in Ref. [?]). Note that the
presented protocol is performed separately on single messages. It seems reasonable to conjecture that if the protocol
were applied to blocks of messages then one could reach the bound of the Holevo information for general ensembles. In
other words, it is very probable that in fact $I_e = I(\mathcal{E}_0) = \lim_{N\to\infty} \frac{1}{N} \inf_{\tilde\varrho} S(\tilde\varrho)$. However, it is difficult to
calculate the minimal asymptotic entropy per message even for the case of a binary source.
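
The single-message bound for the binary source is easy to evaluate once the fidelity is known. The sketch below assumes the convention in which $F$ equals the largest squared overlap $|\langle\psi_1|\psi_2\rangle|^2$ of purifications, and takes $F$ as an input rather than computing it from the density matrices; the function names and parameter values are ours.

```python
import math

def binary_entropy(x):
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def purification_entropy_bound(p1, fidelity):
    """Entropy of the ensemble of optimal purifications for a binary source
    with probabilities p1, 1 - p1 and squared-overlap fidelity `fidelity`."""
    p2 = 1.0 - p1
    x = 0.5 * (1.0 + math.sqrt((p1 - p2) ** 2 + 4.0 * p1 * p2 * fidelity))
    return binary_entropy(x)

disjoint = purification_entropy_bound(0.3, 0.0)    # disjoint supports
partial = purification_entropy_bound(0.3, 0.5)     # intermediate overlap
identical = purification_entropy_bound(0.3, 1.0)   # identical signal states
print(disjoint, partial, identical)
```

For vanishing fidelity the bound reduces to the entropy of the probability distribution, which is the Holevo information of an ensemble with disjoint supports, and for identical signal states it drops to zero.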

V. CONCLUSION 

In conclusion, we have considered the problem of compression of quantum information carried by an ensemble of
mixed states. We have proved that the minimal number of qubits per message needed for asymptotically faithful
transmission is not less than the Holevo information of the initial ensemble. We have also shown that any protocol
providing perfect transmission can be successfully composed with another protocol. We proposed a non-blind protocol
involving the composition of a perfect protocol with the Schumacher-Jozsa one. The first stage is based on replacing
the signal states by new states which, partially traced, reproduce the initial ones. The proposed scheme, if applied
to blocks of messages, is conjectured to reach the bound. Then the Holevo information would acquire a physical
sense within quantum information theory, being a proper generalization of the von Neumann entropy to the case of
ensembles of mixed states and representing the actual quantity of quantum information produced by a source [?].
The problem of whether the passive information (equal to the number of needed qubits if blind coding schemes
are considered) could sometimes be strictly greater than the effective information associated with arbitrary coding
schemes remains open. Finally, we believe that the presented results will be useful in further investigations of the
information content of ensembles of mixed states.

APPENDIX A

Let $\Lambda : \mathcal{S}(\mathcal{H}) \to \mathcal{S}(\mathcal{H})$ be a trace-preserving completely positive map, i.e. let it be of the following form [?]

$\Lambda(\varrho) = \sum_i V_i \varrho V_i^\dagger$ (A1)

Here $\mathcal{S}(\mathcal{H})$ is the set of density matrices acting on the finite-dimensional Hilbert space $\mathcal{H}$, and the $V_i$'s are
operators satisfying $\sum_i V_i^\dagger V_i = I$. It is known that $\Lambda$ is of the form (A1) if and only if it can be implemented by
means of a unitary transformation over a larger system [?]

$\Lambda(\varrho) = \mathrm{Tr}_{\mathcal{H}'} U (\varrho \otimes P) U^\dagger$ (A2)

Here $P$ is a pure state acting on the additional Hilbert space $\mathcal{H}'$; $U$ is a unitary transformation over the whole
space $\mathcal{H} \otimes \mathcal{H}'$. The form (A2) justifies the fact that the completely positive trace-preserving maps are identified
with quantum operations.

Here we will prove the following proposition.

Proposition.- The distortion $D(\varrho, \sigma)$ does not increase under quantum operations, i.e. we have

$D(\Lambda(\varrho), \Lambda(\sigma)) \le D(\varrho, \sigma)$. (A3)

Proof.- In view of the form (A2) it suffices to check that $D$ does not increase under the three components of
the quantum operation: the unitary transformation, the partial trace and the operation $\varrho \to \varrho \otimes P$. As
$D(\varrho, \sigma) = \|\varrho - \sigma\|$ depends only on the eigenvalues of the operator $A = \varrho - \sigma$, it is unitarily invariant. Next, the
operators $A$ and $A \otimes P$ have the same nonzero eigenvalues, so that $D(\varrho \otimes P, \sigma \otimes P) = D(\varrho, \sigma)$. Finally, suppose
that $A$ acts on the Hilbert space $\mathcal{H} \otimes \mathcal{H}'$ and has the spectral decomposition $A = \sum_i \lambda_i P_i$ with one-dimensional
projectors $P_i$. Let us estimate the trace norm of its partial trace

$\|\mathrm{Tr}_{\mathcal{H}'} A\| = \left\|\sum_i \lambda_i \varrho_i\right\| \le \sum_i |\lambda_i| \, \|\varrho_i\| = \sum_i |\lambda_i| = \|A\|$ (A4)

where $\varrho_i = \mathrm{Tr}_{\mathcal{H}'} P_i$. Here we used the triangle inequality for the norm and the fact that $\|\varrho_i\| = 1$. This completes
the proof. The proposition also holds in the case where the operation $\Lambda$ maps $\mathcal{S}(\mathcal{H}_1)$ into $\mathcal{S}(\mathcal{H}_2)$ with different
Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$.
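
The partial-trace step of the proof can be illustrated in the commuting special case, which is an assumption of this sketch rather than the general statement. If $A$ is diagonal on $\mathcal{H} \otimes \mathcal{H}'$ with entries $a_{jk}$, its trace norm is $\sum_{jk} |a_{jk}|$ and its partial trace over $\mathcal{H}'$ is diagonal with entries $\sum_k a_{jk}$, so the inequality reduces to the triangle inequality for real numbers. The example joint distributions are arbitrary.

```python
def trace_norm_diagonal(values):
    """Trace norm of a diagonal operator: sum of |diagonal entries|."""
    return sum(abs(v) for v in values)

def partial_trace_diagonal(entries):
    """Trace out the second index: (Tr_H' A)_j = sum_k a_jk."""
    return [sum(row) for row in entries]

# Diagonal states on a two-qubit space, stored as entries[j][k].
rho = [[0.5, 0.2], [0.2, 0.1]]
sigma = [[0.1, 0.4], [0.3, 0.2]]
diff = [[rho[j][k] - sigma[j][k] for k in range(2)] for j in range(2)]

full_norm = trace_norm_diagonal([x for row in diff for x in row])
reduced_norm = trace_norm_diagonal(partial_trace_diagonal(diff))
print(full_norm, reduced_norm)
```

Here the distortion drops from 0.8 to 0.4 when the second subsystem is discarded, in line with (A4).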